htlcswitch: separate onchain and offchain intercpeted HTLCs cleanly#10895
Conversation
PR Severity: CRITICAL
Critical (2 files):
Medium (1 file):
Low (4 files - excluded from severity bump counts):
AnalysisThis PR modifies core htlcswitch package files. Changes to held_htlc_set.go and interceptable_switch.go affect the HTLC forwarding and payment routing state machine, warranting expert review. Bump check: 4 non-test files (threshold >20), 467 lines changed (threshold >500), 1 critical package. No bump applied. To override, add a severity-override-{critical,high,medium,low} label. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the HTLC interception mechanism to accurately manage HTLCs that transition from off-chain forwarding to on-chain resolution, particularly in scenarios involving channel force-closures. By introducing distinct handling for off-chain and on-chain held HTLCs, the changes prevent incorrect auto-failing or misdirection of settlement attempts, thereby enhancing the robustness and reliability of HTLC management within the system. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request refactors the heldHtlcSet to distinguish between off-chain and on-chain held HTLCs using a new heldEntry interface, ensuring proper handling of resolutions and expirations for both flows. It also updates the InterceptableSwitch and witness_beacon to support this, and adds comprehensive unit and integration tests. A critical deadlock risk was identified in witness_beacon.go where SubscribeUpdates holds a lock while making a blocking call to the interceptor, which can conflict with the switch's event loop attempting to acquire the same lock during resolution; releasing the lock early is recommended.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
cf2e20c to
02347b0
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors the heldHtlcSet to track held HTLCs as either off-chain or on-chain entries using a new heldEntry interface. This change ensures that on-chain re-offers can replace old off-chain holds, allowing settlements to reach the witness beacon after an incoming channel force-closes. Additionally, integration tests are added to verify on-chain settlement scenarios, and minor fixes are made to the witness beacon subscription flow. The reviewer's feedback correctly points out a style guide violation where the newHeldHtlcSet function lacks a documentation comment.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
02347b0 to
6495243
Compare
5e3c3c3 to
424d1a8
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors the heldHtlcSet to support both off-chain and on-chain held HTLCs via a new heldEntry interface, resolving an issue where on-chain forward interceptor settlements failed to reach the witness beacon after a channel force-close. The changes also include updates to InterceptableSwitch and witness_beacon.go, comprehensive unit and integration tests, and updated documentation. Feedback is provided to add a documentation comment to newHeldHtlcSet to align with the repository's style guide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
4768feb to
ab5dbca
Compare
|
I think the promotion path should notify the connected interceptor with the updated on-chain intercept. Step-by-step case:
Severity: blocking. This is in the PR's target no-restart path. The internal held entry is fixed, but the live interceptor client is left with stale off-chain state. For a pure held-before-forward router this may be a failed payment/opportunity loss, but for an interceptor-backed service with external accounting, or a case where Bob already paid/settled downstream, this can become direct funds loss because Bob fails to claim upstream. This issue is introduced by this PR because it changes the internal held entry from off-chain to on-chain, but intentionally suppresses the duplicate notification for already-held circuit keys. That duplicate notification carries a real state transition. Suggested fix: re-send the on-chain packet on promotion. The proto docs already say repeated circuit keys must be handled idempotently. func (s *InterceptableSwitch) interceptOnChain(fwd InterceptedForward) error {
if err := s.heldHtlcSet.addOnChain(fwd); err != nil {
return err
}
if s.interceptor != nil {
s.sendForward(fwd)
}
return nil
}The unit test should use different off-chain and on-chain deadlines and assert that the promoted on-chain packet is delivered: offChain := newMockInterceptedForward(key, 80)
onChain := newMockOnChainInterceptedForward(key, 100)
require.NoError(t, s.heldHtlcSet.addOffChain(offChain))
require.NoError(t, s.interceptOnChain(onChain))
require.Len(t, intercepted, 1)
require.Equal(t, key, intercepted[0].IncomingCircuit)
require.Equal(t, int32(100), intercepted[0].AutoFailHeight()) |
So the idea was to not change behavior to much from the client perspective in case they become confused why they get the same fwd package again, and would basically discard it, I think it makes sense to maybe add this with the RPC change which introduces the deadline timeout as requested here: #10921, but I can of course change it, just let me know what your final opinions are. |
gijswijs
left a comment
There was a problem hiding this comment.
I like this PR a lot. It creates a clean separation of off-chain and on-chain interceptor handling. The held_htlc_set.go now reads much cleaner. I did find some small nits and issues that I've commented on in-line.
Add coverage for held forwards that move on chain after the incoming channel force closes. The restart case exercises the path where Bob loses the in-memory held set and contractcourt re-offers the HTLC through the witness beacon. The no-restart case keeps the original off-chain hold and proves that settlement must still reach the on-chain resolver.
0836c8d to
b689fe6
Compare
Store held forwards as off-chain or on-chain entries instead of a raw InterceptedForward map. Off-chain entries keep the existing resume, fail, settle and auto-fail behavior. On-chain entries are settle-only and expire by pruning local interceptor state. When contractcourt re-offers a circuit that is already held off-chain, replace the stored entry with the on-chain forward so a later SETTLE reaches the witness beacon instead of the old link mailbox path. Also set the on-chain interceptor deadline to the HTLC refund timeout. This keeps the public interceptor deadline populated while ensuring only off-chain held entries use that value to fail back. Only off-chain held HTLCs can be released when an optional interceptor disconnects, because they can resume into the link forwarding flow. On-chain held HTLCs have no link flow to resume. Keep them in the held set so a reconnecting interceptor can replay and settle them while contractcourt waits for the preimage or on-chain expiry. Use distinct internal deadline types for off-chain auto-fail heights and on-chain settlement deadlines instead of overloading the intercepted packet field. Project both variants back into the existing router RPC auto_fail_height field to preserve wire compatibility. Reject mismatched held HTLC deadline types in tests. On-chain intercepted HTLCs can only be settled. Resume and fail actions already return concrete errors through the on-chain intercepted forward, so let those errors propagate to the interceptor client instead of converting them to success. Keep the held entry tracked on these errors so the client can reconnect and settle the HTLC later.
Release the preimage beacon lock before invoking the on-chain interceptor. The interceptor path can block on the htlcswitch event loop, while resolution of another held on-chain HTLC can call back into the beacon to add a preimage. If interceptor delivery fails after the subscriber was registered, cancel the subscription before returning the error. On-chain held entries are replay handles for the interceptor while contractcourt waits for a preimage or on-chain expiry. Once the resolver tears down, keeping the handle until the refund timeout can replay a stale HTLC to a reconnecting interceptor. Thread a dedicated cleanup signal from the witness subscription cancel path back through the interceptable switch event loop. The held set only removes on-chain entries for that signal, leaving off-chain entries under the link flow lifecycle.
routerrpc: document on-chain interceptor responses
b689fe6 to
9c5f32a
Compare
Updated it but slightly more conservative:
But it does not send for:
|
gijswijs
left a comment
There was a problem hiding this comment.
Nice. All my comments are resolved.
LGTM! 🛡️
|
Follow-up/non-blocking robustness note: Step-by-step scenario:
Consequence: resolver cleanup can be delayed, stale on-chain held entries can remain in memory longer than intended, and a resolver goroutine can remain blocked until the interceptor stream unblocks or the switch shuts down. This is not a remote Lightning peer exploit; it is a robustness issue with an operator-controlled interceptor service. Severity: medium/low follow-up. The PR fixes the funds-loss path, so I would not block on this, but the cleanup path should ideally not synchronously depend on the same switch loop that performs blocking interceptor delivery. Minimal follow-up fix shape: decouple resolver cleanup from the synchronous switch-loop rendezvous, for example by making the cancellation callback enqueue cleanup asynchronously: CancelSubscription: func() {
p.Lock()
delete(p.subscribers, clientID)
close(client.quit)
p.Unlock()
go func() {
err := p.cancelInterceptor(inKey)
if err != nil {
srvrLog.Errorf("Cannot remove on-chain "+
"intercept %v: %v", inKey, err)
}
}()
},A stronger design would decouple |
|
Created backport PR for
Please cherry-pick the changes locally and resolve any conflicts. git fetch origin backport-10895-to-v0.20.x-branch
git worktree add --checkout .worktree/backport-10895-to-v0.20.x-branch backport-10895-to-v0.20.x-branch
cd .worktree/backport-10895-to-v0.20.x-branch
git reset --hard HEAD^
git cherry-pick -x 9b31ba83ef52e2d2800f53564f44b9efd4a75ca3 eb1193f80bbe5f0ec2f6618657f1deada5f6561c 98da7b4a56a75ba2dbf47391d43229fd9696e98a 8909c2fbf5535b81ee18c4f4e7fe0160ca43e725 9c5f32a2ec8eab0760a46c3d0357d4127794425f
git push --force-with-lease |
|
Successfully created backport PR for |
…21.x-branch [v0.21.x-branch] Backport #10895: htlcswitch: separate onchain and offchain intercpeted HTLCs cleanly
Change Description
Fixes #10892
Look at the first commit which shows via itests the failure cases of the old Interceptor Implementation
The second commit introduces a new interface and distingishes between onchain and offchain HTLC for the interceptor. It does not change any public interface.